tokens = torch.rand((16,32,152))
show_pca_point_cloud(tokens) # default, no lines connecting dots

viz

Originally written for https://github.com/zqevans/audio-diffusion/blob/main/viz/viz.py
embeddings_table
embeddings_table (tokens)
Make a table of embeddings for use with WandB.
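As a rough illustration of what such a table holds, here is a hedged sketch (not the library's actual implementation, and `embeddings_rows` is a hypothetical name) that flattens a `(b, d, n)` token tensor into one row per token vector, tagged with its batch index, the shape a `wandb.Table` expects:

```python
import numpy as np

# Hypothetical sketch: flatten (b, d, n) tokens into per-token table rows.
# Column layout: [batch, dim_0, ..., dim_{d-1}].
def embeddings_rows(tokens):
    b, d, n = tokens.shape
    rows = []
    for i in range(b):          # sample index within the batch
        for j in range(n):      # token index within the sample
            rows.append([i] + [float(x) for x in tokens[i, :, j]])
    return rows
```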
proj_pca
proj_pca (tokens, proj_dims=3)
Projects the tokens via PCA, keeping the first proj_dims (default 3) dimensions.
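The projection step can be sketched with a plain SVD-based PCA in numpy, assuming tokens of shape `(b, d, n)`; this is illustrative only and may differ from the library's internal implementation:

```python
import numpy as np

# Minimal PCA projection sketch via SVD (illustrative, not the library's code).
def proj_pca_sketch(tokens, proj_dims=3):
    b, d, n = tokens.shape
    pts = tokens.transpose(0, 2, 1).reshape(-1, d)   # one row per token: (b*n, d)
    pts = pts - pts.mean(axis=0, keepdims=True)      # center before PCA
    _, _, Vt = np.linalg.svd(pts, full_matrices=False)
    return pts @ Vt[:proj_dims].T                    # keep first proj_dims components
```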
3D Scatter plots
To visualize point clouds in the notebook and on WandB:

pca_point_cloud
pca_point_cloud (tokens, color_scheme='batch', output_type='wandbobj', mode='markers', size=3, line={'color': 'rgba(10,10,10,0.01)'})
returns a 3D point cloud of the tokens using PCA
| | Type | Default | Details |
|---|---|---|---|
| tokens | | | embeddings / latent vectors, shape = (b, d, n) |
| color_scheme | str | batch | 'batch': group by sample, otherwise color sequentially |
| output_type | str | wandbobj | 'plotly', 'points', or 'wandbobj'. NOTE: WandB can do 'plotly' directly! |
| mode | str | markers | plotly scatter mode: 'lines+markers' or 'markers' |
| size | int | 3 | size of the dots |
| line | dict | {'color': 'rgba(10,10,10,0.01)'} | if mode='lines+markers', a plotly line specifier; cf. https://plotly.github.io/plotly.py-docs/generated/plotly.graph_objects.scatter3d.html#plotly.graph_objects.scatter3d.Line |
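The color bookkeeping implied by `color_scheme='batch'` can be sketched as follows: every token from the same sample shares one color index, while any other value colors points sequentially. The function name here is hypothetical, not the library's API:

```python
import numpy as np

# Sketch of the 'batch' color grouping: one color index per sample,
# repeated across that sample's n tokens.
def point_colors(b, n, color_scheme='batch'):
    if color_scheme == 'batch':
        return np.repeat(np.arange(b), n)   # (b*n,): sample index per point
    return np.arange(b * n)                 # otherwise color sequentially
```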
To display in the notebook (and the online documentation), we need a bit of extra code:
show_pca_point_cloud
show_pca_point_cloud (tokens, color_scheme='batch', mode='markers', line={'color': 'rgba(10,10,10,0.01)'})
Display a 3D scatter plot of the tokens in the notebook.
setup_plotly
setup_plotly (nbdev=True)
Plotly is already set up on Colab, but on regular Jupyter notebooks we need to do a couple of things.
on_colab
on_colab ()
Returns true if code is being executed on Colab, false otherwise
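One plausible way to implement such a check, shown here as an assumption rather than the library's actual code, is to probe for the `google.colab` package that Colab runtimes ship:

```python
import importlib.util

# Hypothetical sketch of a Colab check: Colab environments provide the
# google.colab package, so probing for its spec suffices.
def on_colab_sketch():
    return importlib.util.find_spec("google.colab") is not None
```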
Test the point cloud viz inside a notebook:
Or we can add lines connecting the dots, such as a faint gray line:
show_pca_point_cloud(tokens, mode='lines+markers')

Print audio info
print_stats
print_stats (waveform, sample_rate=None, src=None, print=<built-in function print>)
Print stats about a waveform. Taken verbatim from the PyTorch docs.
Testing that:
audio_filename = 'examples/example.wav'
waveform = load_audio(audio_filename)
print_stats(waveform)

Resampling examples/example.wav from 44100 Hz to 48000 Hz
Shape: (1, 55728)
Dtype: torch.float32
- Max: 0.647
- Min: -0.647
- Mean: 0.000
- Std Dev: 0.075
tensor([[-3.0239e-04, -3.8517e-04, -6.0043e-04, ..., 2.4789e-05,
-1.3458e-04, -8.0428e-06]])
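The statistics shown above can be reproduced with plain numpy. This is a hedged sketch (a hypothetical helper, not the library's `print_stats`) that returns the values instead of printing them:

```python
import numpy as np

# Sketch of the statistics print_stats reports, computed with numpy so it
# runs without torch/torchaudio; returns a dict instead of printing.
def waveform_stats(waveform):
    w = np.asarray(waveform, dtype=np.float32)
    return {
        "shape": tuple(w.shape),
        "dtype": str(w.dtype),
        "max": float(w.max()),
        "min": float(w.min()),
        "mean": float(w.mean()),
        "std": float(w.std()),
    }
```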
Spectrograms
mel_spectrogram
mel_spectrogram (waveform, power=2.0, sample_rate=48000, db=False, n_fft=1024, n_mels=128, debug=False)
Calculates the data array for a mel spectrogram (in however many channels the waveform has).
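For reference, the mel scale underlying this routine maps frequency bins nonlinearly. A minimal sketch of the HTK mel formula (torchaudio's default `mel_scale`) and its inverse, assuming these exact constants:

```python
import numpy as np

# HTK mel-scale formula (torchaudio's default); by construction,
# 1000 Hz maps to approximately 1000 mel.
def hz_to_mel(f_hz):
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```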
spectrogram_image
spectrogram_image (spec, title=None, ylabel='freq_bin', aspect='auto', xmax=None, db_range=[35, 120], justimage=False)
Modified from PyTorch tutorial https://pytorch.org/tutorials/beginner/audio_feature_extractions_tutorial.html
audio_spectrogram_image
audio_spectrogram_image (waveform, power=2.0, sample_rate=48000, print=<built-in function print>, db=False, db_range=[35, 120], justimage=False, log=False)
Wrapper that calls the above two routines at once, using the Mel scale. Modified from the PyTorch tutorial https://pytorch.org/tutorials/beginner/audio_feature_extractions_tutorial.html
Let’s test the above routine:
spec_graph = audio_spectrogram_image(waveform, justimage=False, db=False, db_range=[-60,20])
display(spec_graph)

/Users/shawley/opt/anaconda3/envs/shazbot/lib/python3.8/site-packages/torchaudio/functional/functional.py:571: UserWarning:
At least one mel filterbank has all zero values. The value for `n_mels` (128) may be set too high. Or, the value for `n_freqs` (513) may be set too low.

‘Playable Spectrograms’
Source(s): Original code by Scott Condron (@scottire) of Weights and Biases, edited by @drscotthawley
cf. @scottire’s original code here: https://gist.github.com/scottire/a8e5b74efca37945c0f1b0670761d568
and Morgan McGuire's edit here: https://github.com/morganmcg1/wandb_spectrogram
int(np.random.rand()*10000)

8890
playable_spectrogram
playable_spectrogram (waveform, sample_rate=48000, specs:str='all', layout:str='row', height=170, width=400, cmap='viridis', output_type='wandb', debug=True)
Takes a tensor input and returns a [wandb.]HTML object with spectrograms of the audio.

specs options:
- "all_specs": spectrograms only
- "all": all plots
- "melspec": melspectrogram only
- "spec": spectrogram only
- "wave_mel": waveform and melspectrogram only
- "waveform": waveform only, equivalent to a wandb.Audio object
Limitations: spectrograms show channel 0 only (i.e., mono)
| | Type | Default | Details |
|---|---|---|---|
| waveform | | | audio, PyTorch tensor |
| sample_rate | int | 48000 | sample rate in Hz |
| specs | str | all | see docstring |
| layout | str | row | 'row' or 'grid' |
| height | int | 170 | height of spectrogram image |
| width | int | 400 | width of spectrogram image |
| cmap | str | viridis | colormap string for Holoviews, see https://holoviews.org/user_guide/Colormaps.html |
| output_type | str | wandb | 'wandb', 'html_file', or 'live'; use 'live' for notebooks |
| debug | bool | True | flag for internal print statements |
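The `specs` options above amount to a dispatch table choosing which panes to draw. A hypothetical sketch of that mapping (the real function builds Holoviews panes, which we elide here):

```python
# Hypothetical dispatch table implied by the specs docstring; the actual
# playable_spectrogram constructs Holoviews/HTML panes for each entry.
def panes_for(specs='all'):
    options = {
        'all':       ['waveform', 'spectrogram', 'melspectrogram'],
        'all_specs': ['spectrogram', 'melspectrogram'],
        'melspec':   ['melspectrogram'],
        'spec':      ['spectrogram'],
        'wave_mel':  ['waveform', 'melspectrogram'],
        'waveform':  ['waveform'],
    }
    return options[specs]
```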
generate_melspec
generate_melspec (audio_data, sample_rate=48000, power=2.0, n_fft=1024, win_length=None, hop_length=None, n_mels=128)
helper routine for playable_spectrogram
Sample usage with WandB:
wandb.init(project='audio_test')
wandb.log({"playable_spectrograms": playable_spectrogram(waveform)})
wandb.finish()

See example result at https://wandb.ai/drscotthawley/playable_spectrogram_test/
Test the playable spectrogram:
HTML(playable_spectrogram(waveform, output_type='html_file'))

Let’s show off the multichannel waveform display:
mc_wave = load_audio('examples/stereo_pewpew.mp3')
playable_spectrogram(mc_wave, specs='wave_mel', output_type='live')

Resampling examples/stereo_pewpew.mp3 from 22050 Hz to 48000 Hz